t-Test Assumptions
Wilcoxon Rank Sum
Wilcoxon Signed Rank

June 26, 2025
Thursday

Introduction: Topics

  • Assumptions on t-tests:
    • One-sample
      • Normality
    • Independent samples
      • Normality
      • Equal variance
    • Dependent samples
      • Normality
  • Nonparametric alternatives
    • Independent medians (M_1-M_2)
    • Dependent medians (M_d)

Introduction: Normality Assumption

  • All t-tests assume approximate normality of the data.

    • In the case of one-sample t-tests, the measure of interest must approximately follow a normal distribution.

    • In the case of two-sample t-tests, the measure of interest in each group must approximately follow a normal distribution.

  • Note that a paired t-test is technically a one-sample t-test on the within-pair differences, so we will examine normality of the differences.
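
As a small illustration of the last point, the sketch below uses base R and made-up paired values (the vectors `before` and `after` are hypothetical, not course data). Under those assumptions, a paired t-test and a one-sample t-test on the differences should return the same statistic and p-value:

```r
# Hypothetical paired measurements (not from the course dataset)
before <- c(52, 48, 55, 50, 47, 53, 49, 51)
after  <- c(54, 47, 58, 53, 49, 52, 52, 55)

# Paired t-test on the two columns
paired_res <- t.test(before, after, paired = TRUE)

# One-sample t-test on the within-pair differences
d <- before - after
one_sample_res <- t.test(d, mu = 0)

# Check that both approaches agree
same_t <- isTRUE(all.equal(unname(paired_res$statistic),
                           unname(one_sample_res$statistic)))
same_p <- isTRUE(all.equal(paired_res$p.value, one_sample_res$p.value))
print(c(same_t = same_t, same_p = same_p))
```

This is why the normality check for a paired design is carried out on the differences rather than on each column separately.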

Normality Assumption: Quantile-Quantile Plots

  • There are formal tests for normality (see article here); however, we will not use them.

    • Many statisticians do not endorse formal tests for normality.
  • Instead, we will assess normality using a quantile-quantile (Q-Q) plot.

  • A Q-Q plot helps us visually check if our data follows a specific distribution (here, the normal).

    • It compares the quantiles of our sample data to the quantiles of a theoretical distribution (the normal).
  • How do we read Q-Q plots?

    • Each dot represents one observation in our dataset.
    • If the data follow a normal distribution, we will observe that the dots fall roughly along a straight diagonal line.
    • We focus on the “middle” of the graph.
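
For readers who want to see how such a plot is built, here is a minimal base-R sketch. It uses simulated values (the vector `flaps` is hypothetical, not the course data) and the base functions `qqnorm()` and `qqline()` rather than the course package:

```r
# Simulated wing-flap rates; hypothetical values, not the course data
set.seed(2025)
flaps <- rnorm(25, mean = 50, sd = 5)

# qqnorm() plots sample quantiles against theoretical normal quantiles
# and invisibly returns the plotted coordinates
qq <- qqnorm(flaps, main = "Q-Q plot of simulated wing-flap rates")

# qqline() adds a reference line through the first and third quartiles
qqline(flaps)
```

Because these values were simulated from a normal distribution, the dots should sit close to the reference line, with some wobble in the tails.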

Normality Assumption: Independent Means

  • Recall our example from last lecture: In the skies above Cloudsdale, Pegasus trainers believe that an average healthy Pegasus flaps its wings 50 times per minute when cruising. To see if today’s young Pegasi conform to that standard, a researcher samples 25 Pegasi at the Cloudsdale Training Grounds and measures each pony’s wing-flap rate (in flaps/minute).

Normality Assumption: Independent Means

  • Further, we performed a two-sample t-test to determine whether the above-target Pegasi eat, on average, more than 5 apples more than the below-target Pegasi (\alpha=0.05).
wing_flap %>% independent_mean_HT(grouping = target,
                                  continuous = apples, 
                                  mu = 5, 
                                  alternative = "greater", 
                                  alpha = 0.05)
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)
  • Are our results valid?
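
For comparison, here is a rough base-R sketch of a pooled-variance test with the same hypotheses. Everything about the data is an assumption: the data frame `toy`, its values, and the 13/12 group split (chosen only so that the degrees of freedom come out to 23) are hypothetical, and this is not necessarily how `independent_mean_HT()` is implemented.

```r
# Hypothetical data: 13 above-target and 12 below-target Pegasi (n = 25)
set.seed(1)
toy <- data.frame(
  target = rep(c("above", "below"), times = c(13, 12)),
  apples = c(rnorm(13, mean = 20, sd = 3), rnorm(12, mean = 10, sd = 3))
)

# Pooled-variance t-test of H0: mu_above - mu_below = 5 vs H1: difference > 5
# (groups are ordered alphabetically, so the difference is above - below)
res <- t.test(apples ~ target, data = toy,
              mu = 5, alternative = "greater", var.equal = TRUE)

res$parameter  # degrees of freedom: n1 + n2 - 2
res$statistic  # t statistic
res$p.value    # one-sided p-value
```

Setting `var.equal = TRUE` is what makes this the equal-variance version of the test, which is why both the normality and the equal-variance assumptions matter for its validity.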

Normality Assumption: Independent Means (R)

  • We will use the independent_qq() function from library(ssstats) to assess normality.
dataset_name %>% independent_qq(continuous = continuous_variable,
                                grouping = grouping_variable)
  • This will provide the Q-Q plots and the histograms for the two independent groups under consideration.

Normality Assumption: Independent Means

  • Let’s now look at the normality assumption for our example.

  • How should we change the code for our dataset?

dataset_name %>% independent_qq(continuous = continuous_variable,
                                grouping = grouping_variable)

wing_flap %>% independent_qq(continuous = apples,
                             grouping = target)

Normality Assumption: Independent Means

  • Running the code,
wing_flap %>% independent_qq(continuous = apples,
                             grouping = target)

Introduction: Variance Assumption

  • introduce variance assumption

Variance Assumption: Independent Means (R)

  • syntax for folded F
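
As a hedged illustration of what a folded F test computes, the sketch below works in base R with made-up samples (`g1` and `g2` are hypothetical). It is not the course package's syntax; it only shows the idea of dividing the larger sample variance by the smaller one and doubling the upper-tail area, and compares the result with base R's `var.test()`.

```r
# Hypothetical samples with visibly different spread
g1 <- c(4, 9, 1, 12, 7, 3, 10)
g2 <- c(6, 7, 6, 5, 7, 6, 8, 7)

v1 <- var(g1)
v2 <- var(g2)

# Folded F: larger sample variance over the smaller one
F_stat <- max(v1, v2) / min(v1, v2)
df_num <- if (v1 >= v2) length(g1) - 1 else length(g2) - 1
df_den <- if (v1 >= v2) length(g2) - 1 else length(g1) - 1

# Two-sided p-value: double the upper-tail area, capped at 1
p_folded <- min(1, 2 * pf(F_stat, df_num, df_den, lower.tail = FALSE))

# Base R's two-sided F test of equal variances, for comparison
vt <- var.test(g1, g2)
print(c(F = F_stat, p_folded = p_folded, p_var_test = vt$p.value))
```

For these samples the two p-values should agree, and a small p-value would suggest that the equal-variance assumption is not met.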

Variance Assumption: Independent Means

  • example set up

Variance Assumption: Independent Means

  • example set up with correct code

Variance Assumption: Independent Means

  • code has been run

Variance Assumption: Independent Means

  • HT results

Normality Assumption: Dependent Means

  • normality assumption for dependent t

Normality Assumption: Dependent Means (R)

  • syntax for qq

Normality Assumption: Dependent Means

  • example set up

Normality Assumption: Dependent Means

  • example set up with correct code

Normality Assumption: Dependent Means

  • code has been run

Wrap Up: t-Test Assumptions

  • Important note!!
    • I do not expect you to agree with my assessment of Q-Q plots!
    • What I do expect is that you know what to do after making your assessment.
      • Meet assumption \to use t-test
      • Do not meet assumption \to use nonparametric equivalent
  • Next: nonparametrics!

Introduction: Nonparametrics

  • The t-tests we have already learned are considered parametric methods.

    • There is a distributional assumption on the test.
  • Nonparametric methods do not assume that the data follow a specific distribution (such as the normal).

    • We typically transform the data to their ranks and then perform calculations.
  • Why don’t we always use nonparametric methods?

    • They are often less efficient: when the t-test assumptions hold, a larger sample size is required to achieve the same power (that is, the same probability of a Type II error).

    • They discard useful information :(

Introduction: Wilcoxon Rank Sum

  • introduce W RS

Wilcoxon Rank Sum: Ranking Data

  • In the nonparametric tests we will be learning, the data will be ranked.

  • Let us first consider a simple example, x: \ 1, 7, 10, 2, 6, 8

  • Our first step is to reorder the data: x: \ 1, 2, 6, 7, 8, 10

  • Then, we replace with the ranks: R: \ 1, 2, 3, 4, 5, 6

Wilcoxon Rank Sum: Ranking Data

  • What if not all data values are unique? Each tied value is assigned the average of the ranks that its group of ties occupies.

  • For example, x: \ 9, 8, 8, 0, 3, 4, 4, 8

  • Let’s reorder: x: \ 0, 3, 4, 4, 8, 8, 8, 9

  • Rank ignoring ties: R: \ 1, 2, 3, 4, 5, 6, 7, 8

  • Now, the final rank: R: \ 1, 2, 3.5, 3.5, 6, 6, 6, 8
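
Both ranking examples can be checked with base R's `rank()` function, which averages tied ranks by default. A short sketch (the object names `x1`, `x2`, and `r2` are just labels chosen here):

```r
# First example: all values unique
x1 <- c(1, 7, 10, 2, 6, 8)
rank(x1)        # ranks reported in the original order of x1
rank(sort(x1))  # ranks after reordering, as on the slide

# Second example: tied values share the average of their ranks
x2 <- c(9, 8, 8, 0, 3, 4, 4, 8)
r2 <- rank(sort(x2))  # ties.method = "average" is the default
r2
```

Note that `rank(x1)` keeps the original ordering of the data, so reordering first is only needed to match the layout shown above.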

Wilcoxon Rank Sum

Hypotheses

  • H_0: M_1 - M_2 = M_0 | H_0: M_1 - M_2 \le M_0 | H_0: M_1 - M_2 \ge M_0
  • H_1: M_1 - M_2 \ne M_0 | H_1: M_1 - M_2 > M_0 | H_1: M_1 - M_2 < M_0

Test Statistic & p-Value

  • T = \sum R_{\text{sample 1}} - \frac{n_1(n_1+1)}{2}
  • p = (calculated by R :))
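
To make the test statistic concrete, here is a minimal sketch that computes T by hand and compares it with the W statistic reported by base R's `wilcox.test()`. The two samples are made up for illustration (no ties, not course data):

```r
# Hypothetical samples with no ties (not course data)
sample1 <- c(1.1, 2.3, 4.5, 6.7)
sample2 <- c(0.5, 3.2, 5.1, 7.8, 9.0)

# Rank all observations together, then keep the ranks of sample 1
all_ranks <- rank(c(sample1, sample2))
r1 <- all_ranks[seq_along(sample1)]

# T = (sum of sample 1 ranks) - n1 (n1 + 1) / 2
n1 <- length(sample1)
T_stat <- sum(r1) - n1 * (n1 + 1) / 2

# wilcox.test() reports the same quantity as W
wt <- wilcox.test(sample1, sample2)
print(c(T_by_hand = T_stat, W_from_R = unname(wt$statistic)))
```

Here the ranks of sample 1 are 2, 3, 5, and 7, so T = 17 - 10 = 7, which matches W.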

Rejection Region

  • Reject H_0 if p < \alpha.

Conclusion/Interpretation

  • [Reject or fail to reject] H_0.

  • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Wilcoxon Rank Sum (R)

  • syntax

Wilcoxon Rank Sum: Independent Medians

  • example set up

Wilcoxon Rank Sum: Independent Medians

  • example with correct code

Wilcoxon Rank Sum: Independent Medians

  • code has been run

Wilcoxon Rank Sum: Independent Medians

  • HT typeset

Introduction: Wilcoxon Signed Rank

Wilcoxon Signed Rank: Ranking Data

  • Before ranking, we will find the difference between the paired observations and eliminate any 0 differences.

    • Note 1: eliminating 0 differences is the big difference from the other tests!

    • Note 2: because we are eliminating 0 differences, our sample size becomes the number of pairs with a non-0 difference.

  • When ranking, the differences are ranked based on the absolute value of the difference.

  • We also keep the sign of the difference.

    • We will have positive ranks and negative ranks.
X    Y    D     |D|   Rank
5    8    -3    3     -1.5
8    5    3     3     +1.5
4    4    0     0     (none; 0 difference eliminated)
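
The small table above can be reproduced in base R. This is only a sketch of the ranking steps (the object names are chosen here for illustration):

```r
# Paired observations from the table
X <- c(5, 8, 4)
Y <- c(8, 5, 4)

# Step 1: differences, then eliminate any 0 differences
D <- X - Y
D_nz <- D[D != 0]

# Step 2: rank the absolute differences (ties get the average rank)
abs_rank <- rank(abs(D_nz))

# Step 3: reattach the sign of each difference
signed_rank <- sign(D_nz) * abs_rank
signed_rank

# Updated sample size: number of pairs with a non-0 difference
n <- length(D_nz)
n
```

The two remaining differences tie at |D| = 3, so each receives rank 1.5, one negative and one positive.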

Wilcoxon Signed Rank

Hypotheses

  • H_0: M_d = M_0 | H_0: M_d \le M_0 | H_0: M_d \ge M_0
  • H_1: M_d \ne M_0 | H_1: M_d > M_0 | H_1: M_d < M_0

Test Statistic & p-Value

  • T_0 = \min(T_+, |T_-|) if two-tailed, T_0 = T_+ if left-tailed, and T_0 = |T_-| if right-tailed, where T_+ is the sum of the positive ranks and T_- is the sum of the negative ranks.
  • p = (calculated by R :))
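
The sketch below computes T_+ and |T_-| by hand for made-up paired data (the vectors `x` and `y` are hypothetical, with no ties and no 0 differences). Note that base R's `wilcox.test()` reports V, the sum of the positive ranks, which corresponds to T_+ here:

```r
# Hypothetical paired data (no ties, no zero differences)
x <- c(10, 12, 9, 15, 11)
y <- c(8, 13, 6, 10, 7)

# Signed ranks of the non-zero differences
d <- x - y
d <- d[d != 0]
signed_rank <- sign(d) * rank(abs(d))

# Sum of positive ranks and absolute sum of negative ranks
T_plus <- sum(signed_rank[signed_rank > 0])
T_minus_abs <- abs(sum(signed_rank[signed_rank < 0]))

# Two-tailed statistic as defined above
T0_two_tailed <- min(T_plus, T_minus_abs)

# wilcox.test() with paired = TRUE reports V = T_plus
wt <- wilcox.test(x, y, paired = TRUE)
print(c(T_plus = T_plus, T_minus_abs = T_minus_abs,
        T0 = T0_two_tailed, V = unname(wt$statistic)))
```

With these values the differences are 2, -1, 3, 5, and 4, giving T_+ = 14 and |T_-| = 1, so the two-tailed T_0 is 1.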

Rejection Region

  • Reject H_0 if p < \alpha.

Conclusion/Interpretation

  • [Reject or fail to reject] H_0.

  • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Wilcoxon Signed Rank (R)

  • syntax

Wilcoxon Signed Rank: Dependent Medians

  • example set up

Wilcoxon Signed Rank: Dependent Medians

  • example with correct code

Wilcoxon Signed Rank: Dependent Medians

  • code has been run

Wilcoxon Signed Rank: Dependent Medians

  • HT typeset

Putting It All Together

  • When asked to compare two groups, I first must decide:
    • Independent data?
    • Dependent data?
  • Then, I must decide:
    • Do I meet the assumptions for the appropriate t-test?
      • If so \to proceed with t-test.
      • If not \to use nonparametric alternative.
  • Remember the pairings!
    • Independent t-test \to Wilcoxon rank sum.
    • Dependent t-test \to Wilcoxon signed rank.
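
The decision process above can be written as a tiny lookup function. This is purely illustrative: `choose_test()` is a hypothetical helper written for this sketch, not part of any package, and the judgment about whether assumptions are met still comes from your own reading of the Q-Q plots.

```r
# Hypothetical helper: maps the two decisions to the name of the test to use
choose_test <- function(design = c("independent", "dependent"), assumptions_met) {
  design <- match.arg(design)
  if (design == "independent") {
    if (assumptions_met) "two-sample t-test" else "Wilcoxon rank sum"
  } else {
    if (assumptions_met) "dependent (paired) t-test" else "Wilcoxon signed rank"
  }
}

# Example usage
choose_test("independent", assumptions_met = FALSE)
choose_test("dependent", assumptions_met = TRUE)
```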

“Use the Appropriate Test”: Independent or Dependent?

  • Example 1 - independent or dependent?

“Use the Appropriate Test”: t or Nonparametric?

  • Example 1 - parametric or non?

“Use the Appropriate Test”: Independent or Dependent?

  • Example 2 - independent or dependent?

“Use the Appropriate Test”: t or Nonparametric?

  • Example 2 - parametric or non?

“Use the Appropriate Test”: Independent or Dependent?

  • Example 3 - independent or dependent?

“Use the Appropriate Test”: t or Nonparametric?

  • Example 3 - parametric or non?

“Use the Appropriate Test”: Independent or Dependent?

  • Example 4 - independent or dependent?

“Use the Appropriate Test”: t or Nonparametric?

  • Example 4 - parametric or non?

Wrap Up

  • Today’s lecture:
    • Normality assumption (all t) \to Q-Q plot
    • Variance assumption (independent t) \to folded F test
    • Wilcoxon rank sum (nonparametric equivalent to independent t)
    • Wilcoxon signed rank (nonparametric equivalent to dependent t)
  • Next class:
    • Project 1!

Wrap Up

  • Daily activity: .qmd is available on Canvas.
    • Due date: Monday, June 30, 2025.
  • You will upload the resulting .html file on Canvas.
    • Please refer to the help guide on the Biostat website if you need help with submission.
  • Housekeeping:
    • Are you in the Discord server?
    • Do you have questions for me?
    • Do you need my help with anything from prior lectures?